Sequence Alignment as Hypothesis Testing

Authors

  • Lu Meng
  • Fengzhu Sun
  • Xuegong Zhang
  • Michael S. Waterman
Abstract

Sequence alignment depends on the scoring function that defines similarity between pairs of letters. For local alignment, the computational algorithm searches for the most similar segments in the sequences according to the scoring function. The choice of this scoring function is important for correctly detecting segments of interest. We formulate sequence alignment as a hypothesis testing problem, and conduct extensive simulation experiments to study the relationship between the scoring function and the distribution of aligned pairs within the aligned segment under this framework. We cut through the many ways to construct scoring functions and show that any scoring function with negative expectation used in local alignment corresponds to a hypothesis test between the background distribution of sequence letters and a statistical distribution of letter pairs determined by the scoring function. The results indicate that the log-likelihood ratio scoring function is statistically most powerful and has the highest accuracy for detecting the segments of interest that are defined by the statistical distribution of aligned letter pairs.
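
The correspondence between a scoring function and a distribution of letter pairs can be illustrated with a minimal sketch. It assumes the standard Karlin-Altschul relation, in which the implied pair distribution is q(a,b) = p(a) p(b) exp(λ s(a,b)) and λ is the unique positive root of Σ p(a) p(b) exp(λ s(a,b)) = 1; the abstract does not spell out the authors' construction, so this may differ from it in detail. The function names, the bisection solver, and the uniform DNA background with +1/-1 scores are all illustrative assumptions.

```python
import math


def implied_pair_distribution(p, s):
    """Return (lam, q) for background p and scores s (illustrative helper).

    p: dict mapping letter -> background probability.
    s: dict mapping (letter, letter) -> score.
    Assumes the Karlin-Altschul form q(a,b) = p(a) p(b) exp(lam * s(a,b)).
    """
    pairs = [(a, b) for a in p for b in p]
    expected = sum(p[a] * p[b] * s[(a, b)] for a, b in pairs)
    if expected >= 0:
        raise ValueError("expected score must be negative")
    if max(s[(a, b)] for a, b in pairs if p[a] * p[b] > 0) <= 0:
        raise ValueError("at least one score must be positive")

    def f(lam):
        return sum(p[a] * p[b] * math.exp(lam * s[(a, b)]) for a, b in pairs) - 1.0

    # f(0) = 0, f'(0) < 0 and f is convex, so there is one positive root.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:          # grow the bracket until f changes sign
        lo, hi = hi, hi * 2
    for _ in range(200):       # plain bisection on [lo, hi]
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    lam = (lo + hi) / 2
    q = {(a, b): p[a] * p[b] * math.exp(lam * s[(a, b)]) for a, b in pairs}
    return lam, q


def log_likelihood_ratio_scores(p, q):
    """Log-likelihood ratio score log(q(a,b) / (p(a) p(b))) for each pair."""
    return {(a, b): math.log(q[(a, b)] / (p[a] * p[b])) for (a, b) in q}


# Hypothetical example: uniform DNA background, match +1, mismatch -1.
background = {x: 0.25 for x in "ACGT"}
scores = {(a, b): (1 if a == b else -1) for a in "ACGT" for b in "ACGT"}
lam, target = implied_pair_distribution(background, scores)
llr = log_likelihood_ratio_scores(background, target)
```

In this toy case the log-likelihood ratio scores are just the original scores multiplied by λ, which is one way to see why a scoring function with negative expectation can be read as a test of the background distribution against the pair distribution it implies.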

Similar resources

Maximum Likelihood Estimation and Hypothesis Testing

This chapter is a brief introduction to two important statistical methods: maximum likelihood estimation and hypothesis testing. We shall show how to use these methods to test the biological sequence models developed in previous chapters against experimental data. We shall also show how hypothesis testing ideas inspire scoring methods for sequence alignment. denote the outcome of an experiment ...

Quality estimation of multiple sequence alignments by Bayesian hypothesis testing

In this work we present a web-based tool for estimating multiple alignment quality using Bayesian hypothesis testing. The proposed method is simple, easy to implement, and fast, with linear complexity. We evaluated the method against a series of different alignments (a set of random and biologically derived alignments) and compared the results with tools based on clas...

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

HandAlign: Bayesian multiple sequence alignment, phylogeny and ancestral reconstruction

We describe handalign, a software package for Bayesian reconstruction of phylogenetic history. The underlying model of sequence evolution describes indels and substitutions. Alignments, trees and model parameters are all treated as jointly dependent random variables and sampled via Metropolis-Hastings Markov chain Monte Carlo (MCMC), enabling systematic statistical parameter inferenc...

Journal:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 18, Issue 5

Pages: -

Publication date: 2011